Course set solutions – thematic course on income, tax and welfare benefits
The script below demonstrates how different types of income data and social security data can be used to map people's status in the labor market. It was covered in our thematic course, which ran twice in 2022.
require no.ssb.fdb:23 as db
textblock
Income Data
------------
endblock
textblock
Two types of income data: SSB's income statistics (INNTEKT_) and the A-arrangement, i.e. a-ordningen (ARBLONN_LONN_)
endblock
create-dataset salary_income1
import db/INNTEKT_LONN 2020-12-31 as salary
import db/INNTEKT_WLONN 2020-12-31 as wsalary, outer_join
import db/INNTEKT_YRKINNT 2020-12-31 as occupational_income, outer_join
import db/INNTEKT_WYRKINNT 2020-12-31 as woccupational_income, outer_join
import db/INNTEKT_PGIVINNT 2020-12-31 as pension_giving_income, outer_join
textblock
Salary and occupational income from SSB's income statistics (occupational income includes net business income):
endblock
summarize
barchart(mean) salary wsalary occupational_income woccupational_income pension_giving_income
create-dataset salary_income2
import db/ARBLONN_LONN_TIME 2022-04-30 as arblonn_hourly_wage2204
import db/ARBLONN_LONN_TIME 2022-03-30 as arblonn_hourly_wage2203, outer_join
import db/ARBLONN_LONN_TIME 2022-02-28 as arblonn_hourly_wage2202, outer_join
import db/ARBLONN_LONN_TIME 2022-01-30 as arblonn_hourly_wage2201, outer_join
import db/ARBLONN_LONN_TIME 2021-12-30 as arblonn_hourly_wage2112, outer_join
textblock
Salary statistics from the A-arrangement (unit = jobs, population = employees): Monthly salary for people with agreed hourly wage
endblock
summarize
create-dataset salary_income3
import db/ARBLONN_LONN_TIME_INNRAPP 2022-04-30 as arblonn_hourly_wage2204
import db/ARBLONN_LONN_TIME_INNRAPP 2022-03-30 as arblonn_hourly_wage2203, outer_join
import db/ARBLONN_LONN_TIME_INNRAPP 2022-02-28 as arblonn_hourly_wage2202, outer_join
import db/ARBLONN_LONN_TIME_INNRAPP 2022-01-30 as arblonn_hourly_wage2201, outer_join
import db/ARBLONN_LONN_TIME_INNRAPP 2021-12-30 as arblonn_hourly_wage2112, outer_join
textblock
Salary statistics from the A-arrangement (unit = jobs, population = employees): Monthly salary for people with agreed hourly wage, reported/unprocessed
endblock
summarize
create-dataset salary_income4
import db/ARBLONN_LONN_FAST 2022-04-30 as arblonn_fixed_salary2204
import db/ARBLONN_LONN_FAST 2022-03-30 as arblonn_fixed_salary2203, outer_join
import db/ARBLONN_LONN_FAST 2022-02-28 as arblonn_fixed_salary2202, outer_join
import db/ARBLONN_LONN_FAST 2022-01-30 as arblonn_fixed_salary2201, outer_join
import db/ARBLONN_LONN_FAST 2021-12-30 as arblonn_fixed_salary2112, outer_join
textblock
Salary statistics from the A-arrangement (unit = jobs, population = employees): Fixed salary (agreed fixed salary incl. honorarium, piecework, percent and commission)
endblock
summarize
import db/ARBEIDSFORHOLD_PERSON as personid
create-dataset persons4
import db/ARBLONN_PERS_KJOENN 2022-03-16 as gender
merge gender into salary_income4 on personid
use salary_income4
textblock
Gender differences: On average, women have a 19% lower salary (21% without statistical disclosure control, SDC)
endblock
tabulate gender, summarize(arblonn_fixed_salary2203)
barchart(mean) arblonn_fixed_salary2203, over(gender)
create-dataset salary_income5
import db/ARBLONN_LONN_FAST_INNRAPP 2022-04-30 as arblonn_fixed_salary2204
import db/ARBLONN_LONN_FAST_INNRAPP 2022-03-30 as arblonn_fixed_salary2203, outer_join
import db/ARBLONN_LONN_FAST_INNRAPP 2022-02-28 as arblonn_fixed_salary2202, outer_join
import db/ARBLONN_LONN_FAST_INNRAPP 2022-01-30 as arblonn_fixed_salary2201, outer_join
import db/ARBLONN_LONN_FAST_INNRAPP 2021-12-30 as arblonn_fixed_salary2112, outer_join
textblock
Salary statistics from the A-arrangement (unit = jobs, population = employees): Fixed salary, reported/unprocessed
endblock
summarize
import db/ARBEIDSFORHOLD_PERSON as personid
create-dataset persons5
import db/ARBLONN_PERS_KJOENN 2022-03-16 as gender
merge gender into salary_income5 on personid
use salary_income5
textblock
Gender differences: On average, women have a 20% lower salary (21% without SDC)
endblock
tabulate gender, summarize(arblonn_fixed_salary2203)
barchart(mean) arblonn_fixed_salary2203, over(gender)
create-dataset salary_income6
import db/ARBLONN_LONN_EKV_IALT 2022-03-30 as arblonn_hekv_salary2203
import db/ARBLONN_LONN_EKV_IALT 2022-02-28 as arblonn_hekv_salary2202, outer_join
import db/ARBLONN_LONN_EKV_IALT 2022-01-30 as arblonn_hekv_salary2201, outer_join
import db/ARBLONN_LONN_EKV_IALT 2021-12-30 as arblonn_hekv_salary2112, outer_join
textblock
Salary statistics from the A-arrangement (unit = jobs, population = employees): Full-time equivalent total salary (incl. supplements and bonuses, but not overtime compensation)
endblock
summarize
import db/ARBEIDSFORHOLD_PERSON as personid
create-dataset persons6
import db/ARBLONN_PERS_KJOENN 2022-03-16 as gender
merge gender into salary_income6 on personid
use salary_income6
textblock
Gender differences: On average, women have a 12% lower salary (13% without SDC)
endblock
tabulate gender, summarize(arblonn_hekv_salary2203)
barchart(mean) arblonn_hekv_salary2203, over(gender)
create-dataset salary_income7
import db/ARBLONN_LONN_EKV_FMLONN 2022-03-30 as arblonn_hekv_agreed_salary2203
import db/ARBLONN_LONN_EKV_FMLONN 2022-02-28 as arblonn_hekv_agreed_salary2202, outer_join
import db/ARBLONN_LONN_EKV_FMLONN 2022-01-30 as arblonn_hekv_agreed_salary2201, outer_join
import db/ARBLONN_LONN_EKV_FMLONN 2021-12-30 as arblonn_hekv_agreed_salary2112, outer_join
textblock
Salary statistics from the A-arrangement (unit = jobs, population = employees): Full-time equivalent agreed salary
endblock
summarize
import db/ARBEIDSFORHOLD_PERSON as personid
create-dataset persons7
import db/ARBLONN_PERS_KJOENN 2022-03-16 as gender
merge gender into salary_income7 on personid
use salary_income7
textblock
Gender differences: On average, women have a 12% lower salary (12% without SDC)
endblock
tabulate gender, summarize(arblonn_hekv_agreed_salary2203)
barchart(mean) arblonn_hekv_agreed_salary2203, over(gender)
create-dataset salary_income8
import db/ARBLONN_LONN_KONTANT_IMP 2022-04-30 as arblonn_cash_salary2204
import db/ARBLONN_LONN_KONTANT_IMP 2022-03-30 as arblonn_cash_salary2203, outer_join
import db/ARBLONN_LONN_KONTANT_IMP 2022-02-28 as arblonn_cash_salary2202, outer_join
import db/ARBLONN_LONN_KONTANT_IMP 2022-01-30 as arblonn_cash_salary2201, outer_join
import db/ARBLONN_LONN_KONTANT_IMP 2021-12-30 as arblonn_cash_salary2112, outer_join
textblock
Salary statistics from the A-arrangement (unit = jobs, population = employees): Cash salary (salary incl. supplements, bonuses, overtime compensations, severance pay)
endblock
summarize
import db/ARBEIDSFORHOLD_PERSON as personid
create-dataset persons8
import db/ARBLONN_PERS_KJOENN 2022-03-16 as gender
merge gender into salary_income8 on personid
use salary_income8
textblock
Gender differences: On average, women have a 25% lower salary (28% without SDC)
endblock
tabulate gender, summarize(arblonn_cash_salary2203)
barchart(mean) arblonn_cash_salary2203, over(gender)
textblock
Two types of total income data: SSB's income statistics (INNTEKT_) and SSB's tax statistics (SKATT_)
endblock
create-dataset total_income
import db/INNTEKT_WSAMINNT 2020-12-31 as total_income
import db/INNTEKT_WIES 2020-12-31 as income_after_tax, outer_join
import db/INNTEKT_UTSKATT 2020-12-31 as tax, outer_join
import db/INNTEKT_WSKATOVF 2020-12-31 as tax_and_transfers, outer_join
generate calculated_tax = total_income - income_after_tax
import db/SKATT_ALMINNELIG_INNTEKT 2020-12-31 as ordinary_income, outer_join
import db/SKATT_BRUTTOINNTEKT 2020-12-31 as gross_income, outer_join
generate calculated_tax2 = gross_income - ordinary_income
textblock
Total income, gross and net:
endblock
summarize
textblock
There is also data on social security income from SSB's income statistics that can be used to map social security recipients
endblock
create-dataset social_security_income
import db/INNTEKT_BARNETRYGD 2020-12-31 as child_benefit
import db/INNTEKT_KONTANTSTOTTE 2020-12-31 as cash_support, outer_join
import db/INNTEKT_FORELDREPENGER 2020-12-31 as parental_benefit, outer_join
import db/INNTEKT_SYKEPENGER 2020-12-31 as sick_pay, outer_join
import db/INNTEKT_ARBLED 2020-12-31 as unemployment_benefit, outer_join
import db/INNTEKT_SUM_ARBAVKL 2020-12-31 as work_assessment_allowance, outer_join
import db/INNTEKT_KODE218 2020-12-31 as disability_benefit, outer_join
import db/INNTEKT_GRUNN_HJELP 2020-12-31 as basic_and_assistance_benefit, outer_join
import db/INNTEKT_BOSTOTTE 2020-12-31 as housing_allowance, outer_join
import db/INNTEKT_SOSIAL 2020-12-31 as social_assistance, outer_join
import db/INNTEKT_STUDIESTIPEND 2020-12-31 as study_grant, outer_join
import db/INNTEKT_OVERFOR 2020-12-31 as transfers, outer_join
import db/INNTEKT_WOVERFOR 2020-12-31 as wtransfers, outer_join
textblock
Social security income from SSB's income statistics (excluding pensions):
endblock
summarize
barchart(mean) child_benefit cash_support parental_benefit sick_pay unemployment_benefit work_assessment_allowance disability_benefit basic_and_assistance_benefit housing_allowance social_assistance study_grant transfers wtransfers
barchart(count) child_benefit cash_support parental_benefit sick_pay unemployment_benefit work_assessment_allowance disability_benefit basic_and_assistance_benefit housing_allowance social_assistance study_grant transfers wtransfers
textblock
SDC effect on average figures:
Averages consistently come out lower because extreme values are top-coded. The size of the effect depends on the shape of the value distribution:
- Right-skewed distribution => larger effect
- Normally distributed or left-skewed distribution => smaller effect
Effect by type of variable:
- Salary and occupational income (annual amount): -1.7% to -2.2%
- Salary and occupational income (monthly amount): -0.5% to -1.8%
  - Exception: cash salary: -1.5% to -4.1%
- Total income and tax (annual amount): -2.9% to -6.6%
- Social security (annual amount): -0.1% to -1.2%
- Household income after tax (annual amount): -2.1% to -2.2%
Monthly amounts and social security have less spread => smaller deviation
Total incomes have larger spread => larger deviation
Gender differences become somewhat smaller because men more often have extreme values that are top-coded
endblock
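textblock
Illustrative sketch, not part of the original course material: top-coding can be mimicked by hand to see why it pulls averages down, and why the effect is larger for right-skewed variables such as total income. The cap of 2 000 000 is an arbitrary assumption chosen for illustration; it is not the threshold applied by the platform.
endblock
use total_income
// total_income_capped and the 2000000 cap are hypothetical, for illustration only
generate total_income_capped = total_income
replace total_income_capped = 2000000 if total_income > 2000000
// Compare the mean before and after capping
summarize total_income total_income_capped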
textblock
Net total income for households is also available
endblock
create-dataset household_income
import db/INNTEKT_HUSH_IES 2020-12-31 as household_income_after_tax
import db/INNTEKT_HUSH_IES_EU 2020-12-31 as household_income_after_tax_per_consumption_unit, outer_join
textblock
Household income from SSB's income statistics:
endblock
summarize
textblock
Social Security Data
------------
There are several ways to map job, education and social security status:
a) Annual income data related to job, social security or other support schemes
b) Social security status etc. from FD-Trygd, course data from NUDB, and job status from REGSYS or ARBLONN
Below we demonstrate the different approaches
endblock
textblock
a) Income data as indicators for labor market and social security status
endblock
create-dataset transitions
import db/BEFOLKNING_STATUSKODE 2016-01-01 as regstat1
import db/BEFOLKNING_STATUSKODE 2021-01-01 as regstat2
import db/BEFOLKNING_FOEDSELS_AAR_MND as birth_date
generate age = 2015 - int(birth_date/100)
keep if regstat1 == '1' & age > 25
import db/NUDB_AAR_FORSTE_FULLF_HOV as completed_master
generate master = 0
replace master = 1 if int(completed_master/100) == 2015
import db/INNTEKT_WYRKINNT 2015-12-31 as wyrkinnt15
import db/INNTEKT_ARBLED 2015-12-31 as unemployment_benefit15
import db/INNTEKT_SUM_ARBAVKL 2015-12-31 as work_assessment_allowance15
import db/INNTEKT_KODE218 2015-12-31 as disability_benefit15
import db/INNTEKT_BOSTOTTE 2015-12-31 as housing_allowance15
import db/INNTEKT_SOSIAL 2015-12-31 as social_assistance15
import db/INNTEKT_STUDIESTIPEND 2015-12-31 as scholarship15
import db/INNTEKT_WYRKINNT 2020-12-31 as wyrkinnt20
import db/INNTEKT_ARBLED 2020-12-31 as unemployment_benefit20
import db/INNTEKT_SUM_ARBAVKL 2020-12-31 as work_assessment_allowance20
import db/INNTEKT_KODE218 2020-12-31 as disability_benefit20
import db/INNTEKT_BOSTOTTE 2020-12-31 as housing_allowance20
import db/INNTEKT_SOSIAL 2020-12-31 as social_assistance20
import db/INNTEKT_STUDIESTIPEND 2020-12-31 as scholarship20
summarize
histogram wyrkinnt15
histogram wyrkinnt20
generate status15 = 99
replace status15 = 1 if wyrkinnt15 > 500000
replace status15 = 2 if wyrkinnt15 > 200000 & wyrkinnt15 <= 500000
replace status15 = 3 if wyrkinnt15 > 40000 & wyrkinnt15 <= 200000
replace status15 = 4 if unemployment_benefit15 > 20000
replace status15 = 5 if work_assessment_allowance15 > 20000
replace status15 = 6 if disability_benefit15 > 20000
replace status15 = 7 if housing_allowance15+social_assistance15 > 50000
replace status15 = 8 if scholarship15 > 20000
generate status20 = 99
replace status20 = 9 if regstat2 != '1'
replace status20 = 1 if wyrkinnt20 > 600000
replace status20 = 2 if wyrkinnt20 > 300000 & wyrkinnt20 <= 600000
replace status20 = 3 if wyrkinnt20 > 50000 & wyrkinnt20 <= 300000
replace status20 = 4 if unemployment_benefit20 > 20000
replace status20 = 5 if work_assessment_allowance20 > 20000
replace status20 = 6 if disability_benefit20 > 20000
replace status20 = 7 if housing_allowance20+social_assistance20 > 50000
replace status20 = 8 if scholarship20 > 20000
define-labels status_codes 1 "High income" 2 "Medium income" 3 "Low income" 4 "Unemployment benefits" 5 "Work assessment" 6 Disability 7 Social 8 Student 9 "Dead or emigrated" 99 Other
assign-labels status15 status_codes
assign-labels status20 status_codes
textblock
Residents aged over 25 in 2015 and their status 5 years later. Source: SSB's income statistics and NUDB
endblock
tabulate status20 master, colpct
textblock
Residents aged over 25 who completed a master's degree in 2015 and their status 5 years later. Source: SSB's income statistics and NUDB
endblock
sankey master status20 if master == 1
textblock
Residents aged over 25 who did not complete a master's degree in 2015 and their status 5 years later. Source: SSB's income statistics and NUDB
endblock
sankey master status20 if master == 0
textblock
Status of residents aged over 25 in 2015 and 5 years later. Source: SSB's income statistics and NUDB
endblock
tabulate status15 status20, rowpct
sankey status15 status20
textblock
b) FD-Trygd data etc. as indicators for labor market and social security status
endblock
create-dataset residents
import db/BEFOLKNING_STATUSKODE 2019-01-01 as regstat
keep if regstat == '1'
create-dataset employed
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-01-16 as work_hours01
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-02-16 as work_hours02, outer_join
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-03-16 as work_hours03, outer_join
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-04-16 as work_hours04, outer_join
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-05-16 as work_hours05, outer_join
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-06-16 as work_hours06, outer_join
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-07-16 as work_hours07, outer_join
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-08-16 as work_hours08, outer_join
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-09-16 as work_hours09, outer_join
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-10-16 as work_hours10, outer_join
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-11-16 as work_hours11, outer_join
import db/ARBLONN_PERS_SUM_ARBEIDSTID 2018-12-16 as work_hours12, outer_join
generate months = rowvalid(work_hours01, work_hours02, work_hours03, work_hours04, work_hours05, work_hours06, work_hours07, work_hours08, work_hours09, work_hours10, work_hours11, work_hours12)
generate avg_per_month = rowmean(work_hours01, work_hours02, work_hours03, work_hours04, work_hours05, work_hours06, work_hours07, work_hours08, work_hours09, work_hours10, work_hours11, work_hours12)
summarize months avg_per_month
rename months job_months
rename avg_per_month avg_work_hours_per_month
merge job_months avg_work_hours_per_month into residents
create-dataset unemployment
import-event db/ARBSOEK2001FDT_HOVED 2018-01-01 to 2018-12-31 as job_search
replace STOP@job_search = date(2018,12,31) if STOP@job_search > date(2018,12,31)
replace START@job_search = date(2018,01,01) if START@job_search < date(2018,01,01)
generate days = STOP@job_search - START@job_search + 1
collapse(sum) days, by(PERSONID_1)
boxplot days
summarize days
histogram days, width(30)
rename days unemployment_days
merge unemployment_days into residents
create-dataset disability
import-event db/UFOERP2011FDT_GRAD 2018-01-01 to 2018-12-31 as disability
replace STOP@disability = date(2018,12,31) if STOP@disability > date(2018,12,31)
replace START@disability = date(2018,01,01) if START@disability < date(2018,01,01)
generate days = STOP@disability - START@disability + 1
collapse(sum) days, by(PERSONID_1)
boxplot days
summarize days
histogram days
rename days disability_days
merge disability_days into residents
create-dataset social_assistance
import-event db/SOSHJLPFDT_MOTTAK 2018-01-01 to 2018-12-31 as social
replace STOP@social = date(2018,12,31) if STOP@social > date(2018,12,31)
replace START@social = date(2018,01,01) if START@social < date(2018,01,01)
generate days = STOP@social - START@social + 1
collapse(sum) days, by(PERSONID_1)
boxplot days
summarize days
histogram days, width(30)
rename days social_days
merge social_days into residents
create-dataset education
import-event db/NUDB_KURS_NUS 2018-01-01 to 2018-12-31 as studies
create-dataset personid_edu
import db/NUDB_KURS_FNR as personid
merge personid into education
use education
collapse(min) START@studies (max) STOP@studies, by(personid)
replace STOP@studies = date(2018,12,31) if STOP@studies > date(2018,12,31)
replace START@studies = date(2018,01,01) if START@studies < date(2018,01,01)
generate days = STOP@studies - START@studies + 1
boxplot days
summarize days
histogram days, width(30)
rename days edu_days
merge edu_days into residents
use residents
textblock
Status in the labor market over a year. Source: FD-Trygd/NAV and NUDB
Permanent residents at the end of 2018 by labor market status, measured as the number of days in each status during the year. For jobs: the number of months with a job and the average number of working hours per week across those months. Individuals are counted if they held a status at any point during the year, so the counts are much higher than for status on a single date
endblock
summarize
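textblock
A possible follow-up, sketched under assumptions and not part of the original course material: the day and month counts can be collapsed into a single main status per person, mirroring the income-based classification in approach a). The cut-offs (6 months, 180 days) and the priority order are hypothetical choices. As in approach a), later lines override earlier ones, and the sketch assumes that comparisons with missing values evaluate to false, so persons without any spell keep the value set by earlier lines.
endblock
// main_status, main_status_codes and all cut-offs below are illustrative assumptions
generate main_status = 99
replace main_status = 1 if job_months >= 6
replace main_status = 2 if edu_days > 180
replace main_status = 3 if unemployment_days > 180
replace main_status = 4 if social_days > 180
replace main_status = 5 if disability_days > 180
define-labels main_status_codes 1 "Job" 2 "Education" 3 "Unemployed" 4 "Social assistance" 5 "Disability" 99 "Other"
assign-labels main_status main_status_codes
tabulate main_status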